[VL][Delta] Add native Delta DV reader support #12040

Open
malinjawi wants to merge 3 commits into apache:main from malinjawi:split/delta-dv-native-reader-pr

@malinjawi
Contributor

What changes are proposed in this pull request?

This PR is the second step in the split Delta deletion-vector (DV) stack, following #12001.

It adds the native Velox-side Delta DV reader layer that consumes the roaring bitmap payload facilities introduced by #12001, without adding the JVM-side Delta scan metadata handoff yet.

Main changes:

  • add a native Delta connector and data source backed by the Hive connector/data source infrastructure
  • register a scoped Delta connector alongside the existing scoped Hive connector for each Velox runtime
  • add Delta split metadata types for:
    • deletion-vector descriptors
    • protocol metadata
    • file statistics used for DV validation
    • serialized split payload buffer views
  • add DeltaDeletionVectorReader to load materialized Delta DV payloads using RoaringBitmapArray
  • add DeltaSplitReader to validate DV protocol/statistics metadata and apply row-index filtering semantics
  • add focused native unit coverage for connector setup, split metadata, and deletion-vector reader behavior
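
To illustrate the row-index filtering semantics listed above, here is a minimal, dependency-free sketch. It is not the PR's implementation: `DeletedRows` and `keepRows` are hypothetical names, and `std::set` stands in for `RoaringBitmapArray` purely to keep the example self-contained. The assumption is that the DV holds file-level row indices and that a row is dropped exactly when its index is present in the DV.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a loaded deletion vector. The PR uses
// RoaringBitmapArray; std::set is used here only to avoid dependencies.
struct DeletedRows {
  std::set<uint64_t> rows;

  // True when the file-level row index is marked as deleted.
  bool isDeleted(uint64_t rowIndex) const {
    return rows.count(rowIndex) > 0;
  }
};

// Returns the file-level row indices in [firstRow, firstRow + numRows)
// that are NOT marked as deleted, in ascending order.
std::vector<uint64_t> keepRows(
    const DeletedRows& dv,
    uint64_t firstRow,
    uint64_t numRows) {
  std::vector<uint64_t> kept;
  kept.reserve(numRows);
  for (uint64_t i = 0; i < numRows; ++i) {
    const uint64_t rowIndex = firstRow + i;
    if (!dv.isDeleted(rowIndex)) {
      kept.push_back(rowIndex);
    }
  }
  return kept;
}
```

For example, with deleted rows {1, 4, 5} and a batch covering rows 0 through 7, `keepRows` returns 0, 2, 3, 6, 7. An empty DV keeps every row in the batch.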

This PR is intentionally native-reader only:

  • no JVM-side Delta scan metadata handoff yet
  • no end-to-end Delta scan offload behavior change yet

Those pieces will be added in follow-up split PRs.

Related issue: #11901.

How was this patch tested?

Added focused native test coverage in:

  • cpp/velox/compute/delta/tests/DeltaConnectorTest.cpp
  • cpp/velox/compute/delta/tests/DeltaSplitTest.cpp
  • cpp/velox/compute/delta/tests/DeltaDeletionVectorReaderTest.cpp

Covered cases:

  • Delta connector configuration and connector properties
  • split-carried deletion-vector descriptors and logical row-count accounting
  • loading materialized DV payloads from RoaringBitmapArray
  • row deletion checks and keep/drop filter decisions
  • empty payload handling and invalid payload rejection
  • protocol/statistics validation for DV-bearing splits
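
As a rough sketch of the payload cases above (empty payload, invalid payload, cardinality validation), the following uses a toy encoding rather than the real Delta or `RoaringBitmapArray` wire format. `loadToyDeletionVector` and `encodeToyDeletionVector` are hypothetical names, and treating the optional expected cardinality as "the number of deleted rows the split metadata claims" is an assumption, not something confirmed by the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Toy payload format (NOT the Delta / RoaringBitmapArray wire format):
// a sequence of 8-byte row indices in host byte order.
std::string encodeToyDeletionVector(const std::vector<uint64_t>& rows) {
  std::string payload(rows.size() * sizeof(uint64_t), '\0');
  for (size_t i = 0; i < rows.size(); ++i) {
    std::memcpy(&payload[i * sizeof(uint64_t)], &rows[i], sizeof(uint64_t));
  }
  return payload;
}

// Decodes a toy payload. An empty payload yields an empty set. A payload
// whose size is not a multiple of 8 bytes is rejected. If an expected
// cardinality is supplied (assumed to come from split metadata), a
// mismatch with the decoded row count is rejected as well.
std::set<uint64_t> loadToyDeletionVector(
    std::string_view payload,
    std::optional<uint64_t> expectedCardinality = std::nullopt) {
  if (payload.size() % sizeof(uint64_t) != 0) {
    throw std::invalid_argument("payload size is not a multiple of 8 bytes");
  }
  std::set<uint64_t> rows;
  for (size_t offset = 0; offset < payload.size();
       offset += sizeof(uint64_t)) {
    uint64_t row = 0;
    std::memcpy(&row, payload.data() + offset, sizeof(row));
    rows.insert(row);
  }
  if (expectedCardinality.has_value() &&
      rows.size() != *expectedCardinality) {
    throw std::invalid_argument("deletion vector cardinality mismatch");
  }
  return rows;
}
```

Failing fast on a cardinality mismatch is one plausible design: it surfaces a stale or corrupted payload before any rows are filtered with it.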

Validation run:

  • fork preview CI against malinjawi/incubator-gluten:main on the combined PR2 branch: all checks passed after rerunning two infra-flaky jobs
  • local git diff --check upstream/main...HEAD
  • local clang-format pass with /opt/homebrew/opt/llvm@15/bin/clang-format over changed C++ files

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

@github-actions Bot added the VELOX and CORE labels on May 5, 2026
@github-actions

github-actions Bot commented May 6, 2026

Run Gluten Clickhouse CI on x86

@malinjawi force-pushed the split/delta-dv-native-reader-pr branch from 1a16894 to 66ea460 on May 7, 2026 09:29
@github-actions Bot removed the CORE label on May 7, 2026
Member

@zhztheplayer left a comment

Thanks.

Comment on lines +59 to +79
struct DeltaProtocolInfo {
  int32_t minReaderVersion;
  int32_t minWriterVersion;
  std::optional<std::vector<std::string>> readerFeatures;
  std::optional<std::vector<std::string>> writerFeatures;

  /// Check if this protocol supports deletion vectors.
  /// Returns true if:
  /// - minReaderVersion >= 3
  /// - minWriterVersion >= 7
  /// - 'deletionVectors' is in readerFeatures
  bool supportsDeletionVectors() const {
    if (minReaderVersion < 3 || minWriterVersion < 7) {
      return false;
    }
    if (!readerFeatures.has_value()) {
      return false;
    }
    return std::find(readerFeatures->begin(), readerFeatures->end(), "deletionVectors") != readerFeatures->end();
  }
};

In which cases do we need this struct and the validation logic? Won't we only use native DV features when the Java-side feature exists?


namespace {

class DeltaConnectorTest : public ::testing::Test {

Can we also add some test cases that are more E2E? E.g., read a file with a passed-in DV, then verify whether the unwanted rows are filtered out?

Comment on lines +68 to +70
void loadSerializedDeletionVector(
    std::string_view serializedPayload,
    std::optional<uint64_t> expectedCardinality = std::nullopt);

Can we add comments explaining expectedCardinality?
